{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#  Tensorflow Timeline Analysis on Model Zoo Benchmark between Intel optimized and stock Tensorflow\n",
    "\n",
    "This jupyter notebook will help you evaluate performance benefits from Intel-optimized Tensorflow on the level of Tensorflow operations via several pre-trained models from Intel Model Zoo. The notebook will show users a bar chart like the picture below for the Tensorflow operation level performance comparison. The red horizontal line represents the performance of Tensorflow operations from Stock Tensorflow, and the blue bars represent the speedup of Intel Tensorflow operations. The operations marked as \"mkl-True\" are accelerated by oneDNN a.k.a MKL-DNN, and users should be able to see a good speedup for those operations accelerated by oneDNN. \n",
    "> NOTE : Users need to get Tensorflow timeline json files from other Jupyter notebooks like benchmark_perf_comparison\n",
    "  first to proceed this Jupyter notebook."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> NOTE: Users could also compare elapsed time of TF ops among any two different TF timeline files.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=\"images\\compared_tf_op_duration_ratio_bar.png\" width=\"700\">"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The notebook will also show users two pie charts like the picture below for elapsed time percentage among different Tensorflow operations.   \n",
    "Users can easily find the Tensorflow operation hotspots in these pie charts among Stock and Intel Tensorflow."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=\"images\\compared_tf_op_duration_pie.png\" width=\"700\">"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Get Platform Information "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# ignore all warning messages\n",
    "import warnings\n",
    "warnings.filterwarnings('ignore')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from profiling.profile_utils import PlatformUtils\n",
    "plat_utils = PlatformUtils()\n",
    "plat_utils.dump_platform_info()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#  Section 1: TensorFlow Timeline Analysis\n",
    "## Prerequisites"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install cxxfilt\n",
    "\n",
    "%matplotlib inline\n",
    "import matplotlib.pyplot as plt\n",
    "import tensorflow as tf"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "pd.set_option('display.max_rows', 500)\n",
    "pd.set_option('display.max_columns', 500)\n",
    "pd.set_option('display.width', 1500)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## List out the Timeline folders"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First, list out all Timeline folders from previous runs."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "filenames= os.listdir (\".\") \n",
    "result = []\n",
    "keyword = \"Timeline\"\n",
    "for filename in filenames: \n",
    "    if os.path.isdir(os.path.join(os.path.abspath(\".\"), filename)): \n",
    "        if filename.find(keyword) != -1:\n",
    "                result.append(filename)\n",
    "result.sort()\n",
    "\n",
    "index =0 \n",
    "for folder in result:\n",
    "    print(\" %d : %s \" %(index, folder))\n",
    "    index+=1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Select a Timeline folder from previous runs\n",
    "#### ACTION: Please select one Timeline folder and change FdIndex accordingly"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# use the \"FD_INDEX\" environment variable value if it exists.\n",
    "import os\n",
    "env_fd_index=os.environ.get('FD_INDEX', '')\n",
    "if env_fd_index != '':\n",
    "    FdIndex= int(env_fd_index)\n",
    "else:\n",
    "    ## USER INPUT\n",
    "    FdIndex= int(input('Input a index number of a folder: '))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "List out all Timeline json files inside Timeline folder."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "TimelineFd = result[FdIndex]\n",
    "print(TimelineFd)\n",
    "datafiles = [TimelineFd +os.sep+ x for x in os.listdir(TimelineFd) if '.json' == x[-5:]]\n",
    "print(datafiles)\n",
    "if len(datafiles) is 0:\n",
    "    print(\"ERROR! No json file in the selected folder. Please select other folder.\")\n",
    "elif len(datafiles) is 1:\n",
    "    print(\"WARNING! There is only 1 json file in the selected folder. Please select other folder to proceed Section 1.2.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> **Users can bypass below Section 1.1 and analyze performance among Stock and Intel TF by clicking the link : [Section 1_2](#section_1_2).**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id='section_1_1'></a>\n",
    "## Section 1.1: Performance Analysis for one TF Timeline result\n",
    "### Step 1: Pick one of the Timeline files\n",
    "#### List out all the Timeline files first\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "index = 0\n",
    "for file in datafiles:\n",
    "    print(\" %d : %s \" %(index, file))\n",
    "    index+=1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### ACTION: Please select one timeline json file and change file_index accordingly"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "## USER INPUT\n",
    "# use the \"FILE_INDEX\" environment variable value if it exists.\n",
    "import os\n",
    "env_file_index=os.environ.get('FILE_INDEX', '')\n",
    "if env_file_index != '':\n",
    "    file_index= int(env_file_index)\n",
    "else:\n",
    "    ## USER INPUT\n",
    "    file_index= int(input('Input a index number of a file: '))\n",
    "\n",
    "fn = datafiles[file_index]\n",
    "tfile_prefix = fn.split('_')[0]\n",
    "tfile_postfix = fn.strip(tfile_prefix)[1:]\n",
    "fn"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 2: Parse timeline into pandas format"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from profiling.profile_utils import TFTimelinePresenter\n",
    "tfp = TFTimelinePresenter(True)\n",
    "timeline_pd = tfp.postprocess_timeline(tfp.read_timeline(fn))\n",
    "timeline_pd = timeline_pd[timeline_pd['ph'] == 'X']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 3: Sum up the elapsed time of each TF operation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "tfp.get_tf_ops_time(timeline_pd,fn,tfile_prefix)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 4: Draw a bar chart for elapsed time of TF ops "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "filename= tfile_prefix +'_tf_op_duration_bar.png'\n",
    "title_=tfile_prefix +'TF : op duration bar chart'\n",
    "ax=tfp.summarize_barh(timeline_pd, 'arg_op', title=title_, topk=50, logx=True, figsize=(10,10))\n",
    "tfp.show(ax,'bar')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 5: Draw a pie chart for total time percentage of TF ops "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "filename= tfile_prefix +'_tf_op_duration_pie.png'\n",
    "title_=tfile_prefix +'TF : op duration pie chart'\n",
    "timeline_pd_known = timeline_pd[ ~timeline_pd['arg_op'].str.contains('unknown') ]\n",
    "ax=tfp.summarize_pie(timeline_pd_known, 'arg_op', title=title_, topk=50, logx=True, figsize=(10,10))\n",
    "tfp.show(ax,'pie')\n",
    "ax.figure.savefig(filename,bbox_inches='tight')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id='section_1_2'></a>\n",
    "## Section 1.2: Analyze TF Timeline results between Stock and Intel Tensorflow\n",
    "> NOTE : Users could also compare elapsed time of TF ops among any two different TF timeline files.\n",
    "\n",
    "### Speedup from oneDNN among different TF operations"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 1: Select  one Intel and one Stock TF timeline files for analysis\n",
    "> NOTE: Users could also pick any two different TF timeline files.\n",
    "\n",
    "#### List out all timeline files in the selected folder"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "if len(datafiles) is 1:\n",
    "    print(\"ERROR! There is only 1 json file in the selected folder.\")\n",
    "    print(\"Please select other Timeline folder from beginnning to proceed Section 1.2.\")\n",
    "\n",
    "for i in range(len(datafiles)):\n",
    "    print(\" %d : %s \" %(i, datafiles[i]))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### ACTION: Please select one timeline file as a perfomance baseline and the other as a comparison target\n",
    "put the related index for your selected timeline file.\n",
    "In general, please put stock_timeline_xxxxx as the baseline."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# perfomance baseline \n",
    "# use the \"BASELINE_INDEX\" environment variable value if it exists.\n",
    "import os\n",
    "env_baseline_index=os.environ.get('BASELINE_INDEX', '')\n",
    "if env_baseline_index != '':\n",
    "    Baseline_Index= int(env_baseline_index)\n",
    "else:\n",
    "    ## USER INPUT\n",
    "    Baseline_Index= int(input('Input a index number of a Performance Baseline: '))\n",
    "# comparison target\n",
    "Comparison_Index = 0 if Baseline_Index else 1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### List out two selected timeline files"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "selected_datafiles = []\n",
    "selected_datafiles.append(datafiles[Baseline_Index])\n",
    "selected_datafiles.append(datafiles[Comparison_Index])\n",
    "print(selected_datafiles)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 2: Parsing timeline results into CSV files"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%matplotlib agg\n",
    "from profiling.profile_utils import TFTimelinePresenter\n",
    "csvfiles=[]\n",
    "percentage_filename = ''\n",
    "\n",
    "tfp = TFTimelinePresenter(True)\n",
    "for fn in selected_datafiles:\n",
    "    if fn.find('/'):\n",
    "        fn_nofd=fn.split('/')[1]\n",
    "    else:\n",
    "        fn_nofd=fn\n",
    "    tfile_name= fn_nofd.split('.')[0]\n",
    "    tfile_prefix = fn_nofd.split('_')[0]\n",
    "    tfile_postfix = fn_nofd.strip(tfile_prefix)[1:]\n",
    "    csvpath = TimelineFd +os.sep+tfile_name+'.csv'\n",
    "    print(csvpath)\n",
    "    csvfiles.append(csvpath)\n",
    "    timeline_pd = tfp.postprocess_timeline(tfp.read_timeline(fn))\n",
    "    timeline_pd = timeline_pd[timeline_pd['ph'] == 'X']\n",
    "    sitems, percentage_filename = tfp.get_tf_ops_time(timeline_pd,fn,tfile_prefix)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### The pie chart for elapsed time of  oneDNN operations from Intel TF "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%matplotlib inline\n",
    "if percentage_filename != '':\n",
    "    print(percentage_filename)\n",
    "    tfp.plot_pie_chart(percentage_filename, 'mkl_percentage')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 3: Pre-processing for the two CSV files"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import pandas as pd\n",
    "\n",
    "csvarray = []\n",
    "csvfilenames= []\n",
    "for csvf in csvfiles:\n",
    "    print(\"read into pandas :\",csvf)\n",
    "    a = pd.read_csv(csvf)\n",
    "    csvarray.append(a)\n",
    "    if csvf.find(os.sep) > 0:\n",
    "        csvfilenames.append(csvf.split(os.sep)[-1])\n",
    "    else:\n",
    "        csvfilenames.append(csvf)\n",
    "\n",
    "a = csvarray[0]\n",
    "b = csvarray[1]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Find tags among CSV files\n",
    "tags=[]\n",
    "from profiling.profile_utils import PerfPresenter\n",
    "perfp=PerfPresenter()\n",
    "tag0, tag1 = perfp.get_diff_from_csv_filenames(csvfilenames[0][:-4],csvfilenames[1][:-4])\n",
    "tags = [tag0, tag1]\n",
    "print('tags : ',tags)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 4: Merge two CSV files and caculate the speedup accordingly"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Merge two csv files"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import pandas as pd\n",
    "fdir='merged'\n",
    "if not os.path.exists(fdir):\n",
    "    os.mkdir(fdir)\n",
    "\n",
    "fpaths=[]\n",
    "fpaths.append(fdir+os.sep+'merged.csv')\n",
    "fpaths.append(fdir+os.sep+'diff_'+tags[0]+'.csv')\n",
    "fpaths.append(fdir+os.sep+'diff_'+tags[1]+'.csv')\n",
    "#merged=tfp.merge_two_csv_files(fpath,a,b)\n",
    "merged=tfp.merge_two_csv_files_v2(fpaths, a, b, tags)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Compare common operations among those two csv files"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"Compare common operations between \", tags)\n",
    "merged_df = pd.read_csv(fpaths[0])\n",
    "merged_df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### The unique Tensorflow operations from the first csv/Timline file"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%matplotlib inline\n",
    "print(\"Operations are only in\", tags[0], \" run\")\n",
    "extra1 = pd.read_csv(fpaths[1])\n",
    "extra1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### The unique Tensorflow operations from the second csv/Timline file"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"Operations are only in\", tags[1], \" run\")\n",
    "extra2 = pd.read_csv(fpaths[2])\n",
    "extra2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 5: Draw a bar chart for elapsed time of common TF ops among stock TF and Intel TF"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> NOTE: Users could also compare elapsed time of TF ops among any two different TF timeline files.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%matplotlib inline\n",
    "print(fpaths[0])\n",
    "tfp.plot_compare_bar_charts(fpaths[0], tags=tags)\n",
    "tfp.plot_compare_ratio_bar_charts(fpaths[0], tags=['','oneDNN ops'], max_speedup=20)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 6: Draw pie charts for elapsed time of TF ops among stock TF and Intel TF"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> NOTE: Users could also compare elapsed time of TF ops among any two different TF timeline files.\n",
    "\n",
    "We will have following pie charts in sequence:\n",
    "1. the pie chart for elpased time of TF ops from stock TF or the first csv/Timeline file\n",
    "2. the pie chart for elpased time of unique TF ops from stock TF or the first csv/Timeline file\n",
    "3. the pie chart for elpased time of TF ops from Intel TF or the second csv/Timeline file\n",
    "4. the pie chart for elpased time of unique TF ops from Intel TF or the second csv/Timeline file\n",
    "5. the pie chart for elpased time of common TF ops among stock & Intel TF or two csv/Timeline files\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### The pie chart for elapsed time of TF ops from Stock TF or the first csv/Timline file\n",
    "understand which TF operations spend most of time."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "tfp.plot_pie_chart(csvfiles[0], tags[0])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### The pie chart for elapsed time of  unique TF operations from Stock TF or the first csv/Timline file\n",
    "understand if there is any unique TF operation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "tfp.plot_pie_chart(fpaths[1], tags[0])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### The pie chart for elapsed time of TF ops from Intel TF or the second csv/Timline file\n",
    "understand which TF operations spend most of time."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "tfp.plot_pie_chart(csvfiles[1], tags[1])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### The pie chart for elapsed time of  unique TF operations from Intel TF or the seond csv/Timline file\n",
    "understand if there is any unique TF operation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "tfp.plot_pie_chart(fpaths[2], tags[1])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### The pie chart for elapsed time of common TF ops among Stock & Intel TF or  two csv/Timline files\n",
    "understand top hotspots differences among Stock & Intel TF or two csv/Timeline files."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "tfp.plot_compare_pie_charts(fpaths[0], tags=tags)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "intel-tensorflow",
   "language": "python",
   "name": "intel-tensorflow"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
